Abstract. We study galaxy correlations in samples extracted from the 2dFGRS final release. Statistical properties are characterized by the nearest neighbor probability density, the conditional density and the reduced two-point correlation function. We find that the conditional density has a power-law behavior in redshift space, described by an exponent γ = 0.8 ± 0.2, in the interval from about 1 Mpc/h, the average distance between nearest galaxies, up to about 40 Mpc/h, corresponding to the radius of the largest sphere contained in the samples. These results are consistent with other studies of the conditional density and help to clarify the subtle role of finite-size effects in the determination of the two-point correlation function in redshift and real space.
1. Introduction

The problem of the quantitative characterization of large-scale galaxy clustering has been intensively discussed in recent years, especially in relation to two new galaxy surveys: the Sloan Digital Sky Survey (SDSS; York et al., 2000) and the Two degree Field Galaxy Redshift Survey (2dFGRS; Colless et al., 2003). These data represent a great improvement in our knowledge of the local universe: for example, the number of measured redshifts has grown by a factor of ten with respect to the surveys completed in the previous two decades. Moreover, accurate redshift determinations and multi-band photometry allow a precise characterization of many parameters and effects (e.g. K-corrections) which were poorly constrained until a few years ago. It should however be noted that some analyses, like the ones we discuss here, also require a large solid angle. This is still not the case for the present data but, for instance, the final release of the SDSS will provide a large contiguous region of the sky in the very near future.

In this paper we discuss the analysis of two-point correlation properties in the 2dFGRS sample. Up to now these data have mainly been analyzed by studying the reduced correlation function ξ(r), in redshift and real space, and its Fourier conjugate, the power spectrum (e.g. Norberg et al. 2001, 2002; Tegmark, Hamilton & Xu 2002; Hawkins et al. 2003; Madgwick et al. 2003; Basilakos & Plionis 2003; Cole et al. 2005). Recently Gaztanaga et al. (2005) presented new results for the three-point correlation function measured as a function of scale, luminosity and color using the 2dFGRS sample.

In general, these statistical tools can be affected by finite-size effects or by luminosity-dependent selection effects (e.g. Gabrielli et al., 2004) and, by using appropriate statistics, one may perform several tests in order to disentangle different biases. Finite-size effects can be very important for the determination of correlation properties in the regime of large fluctuations, which should then be clearly identified in studies of galaxy samples. It is in fact well known that at small scales observed galaxy structures are highly irregular and present power-law two-point correlations, in the regime of strong clustering. However, the search for the "maximum" size of galaxy structures and voids, beyond which the distribution becomes essentially uniform and fluctuations can be considered small perturbations with respect to the average density, is still an open problem (Tikhonov & Makarov 2003; Hogg et al. 2005; Joyce et al. 2005; see Baryshev & Teerikorpi 2005 for a recent review). From the theoretical point of view, the statistical characteristics of these structures are the key element that any physical theory of their formation must account for.

A number of statistical methods can be used to study the galaxy distribution. The main ones involve the determination of two-point properties, although the study of the distribution function, which contains information on higher-order correlations, has also been found to be a powerful method (e.g. Sivakoff & Saslaw 2005). The primary questions in the correlation analysis of three-dimensional galaxy distributions are: (i) what is the value of the correlation exponent, and (ii) at which scale does the distribution become uniform, so that a crossover to homogeneity can be clearly identified? Such a scale can be defined, for example, as the one beyond which the conditional counts of galaxies in three-dimensional volumes of radius R grow as R^3. Recently Hogg et al. (2005), considering the properties of a deep and complete sample of luminous red galaxies extracted from the SDSS, found that the transition from the strongly correlated regime to the uniform one occurs at about 70 Mpc/h¹, which is larger than, for example, the result for the CfA1 redshift survey, where the transition was found at about 20 Mpc/h (Davis & Peebles, 1983; see Peebles 2001 for a recent discussion). In particular, they measured the behavior of the conditional density in redshift space, finding that the exponent characterizing power-law correlations is γ ≈ 1 (instead of γ = 1.8 as measured by Davis & Peebles 1983) up to 20-30 Mpc/h, and that this is followed by a slow crossover toward homogeneity, which is reached at about 70 Mpc/h. These results are in good agreement with those presented in, e.g., Sylos Labini et al. (1998) (see Baryshev & Teerikorpi, 2005 for a recent review), where the same value γ ≈ 1 was found up to 20-30 Mpc/h and where, at larger scales and with weaker statistics, evidence was found for compatibility with the extension of such a behavior. In addition, Tikhonov, Makarov & Kopylov (2000) found similar results up to scales of ~30 Mpc/h, and weaker evidence for homogeneity at scales larger than 100 Mpc/h.

In this paper we present the results of a correlation analysis of the 2dFGRS data, studying the behavior of the conditional density and of other statistics suitable to characterize the properties of distributions with large fluctuations and to control finite-size effects. In Sec. 2 we describe the procedure used to construct samples which are not biased by the luminosity selection in apparent magnitude (the so-called volume limited, VL, samples). In Sec. 3 we consider the nearest neighbor probability density of the VL samples, which allows us to characterize small-scale statistical properties. We then turn to large scales in Sec. 4, where we discuss the estimation of the conditional density and the results obtained in the VL samples. We discuss the relation of this statistical tool to the reduced two-point correlation function in Sec. 5, where we compare our results with previous estimations of the same statistic, focusing on finite-size effects and their implications for the interpretation of galaxy correlations. Finally, in Sec. 6 we summarize our results, discuss their relation to other studies and draw our main conclusions.



¹ Note that we use as Hubble constant the value H_0 = 100 h km/sec/Mpc, where 0.4 < h < 0.7.



2. Volume limited subsamples 

The 2dFGRS is the largest completed galaxy redshift catalog at the moment. The Final Release (Colless et al., 2003) contains more than 220 000 precisely measured redshifts of galaxies located in two strips: about 140 000 in the southern galactic pole (SGP) region, in a strip of 90° × 15°, and about 70 000 in a 75° × 10° strip in the northern galactic pole (NGP) region. In addition, the survey contains 10 000 galaxies in random fields, which are not used in this paper.

The median redshift of the galaxies is z ~ 0.1 and most of them have z < 0.3. The b_J magnitude, corrected for galactic extinction, is limited to 14.0 < b_J < 19.45.

2.1. Selection of subsamples 

To avoid the effect of the irregular edges in angular coordinates due to the survey geometry, we set the following limits in right ascension and declination, in order to obtain a rectangular shape on the sky in (α, δ) coordinates:

- SGP: 84° × 9° (−33° < δ < −24°, −32° < α < 52°)

- NGP: 60° × 6° (−4° < δ < 2°, 150° < α < 210°)

We select galaxies in the redshift interval 0.01 < z < 0.3 and with redshift quality parameter Q > 3, in order to have high-quality redshifts (see the discussion in Hawkins et al. 2003).

We do not apply corrections for the redshift completeness mask or for fiber collision effects. In fact, completeness varies mostly near the survey edges, which are excluded from our samples. We assume that fiber collisions do not significantly change the small-scale correlation properties, as we set our lower cut-off at 0.5 Mpc/h, which is larger than the 0.1 Mpc/h used by Hawkins et al. (2003).

To construct the VL subsamples we first compute metric distances as

$$ r(z) = \frac{c}{H_0} \int_0^z \frac{dz'}{\sqrt{\Omega_M (1+z')^3 + \Omega_\Lambda}} , \tag{1} $$

where we use the standard model parameters Ω_M = 0.3 and Ω_Λ = 0.7. The absolute magnitude can then be computed as

$$ M = b_J - 5 \log_{10}\left[ r(z) (1+z) \right] - K_i(z) - 25 . \tag{2} $$

To calculate the K-correction K_i(z) (the index i defines the galaxy type) we used the formulas obtained by Madgwick et al. (2002):

$$
\begin{aligned}
K_1(z) &= 2.6z + 4.3z^2 && \text{(E/S0)} \\
K_2(z) &= 1.9z + 2.2z^2 && \text{(Sa/Sb)} \\
K_3(z) &= 1.3z + 2.0z^2 && \text{(Sc/Sd)} \\
K_4(z) &= 0.9z + 2.3z^2 && \text{(Irr)} \\
K_{avg}(z) &= 1.9z + 2.7z^2 && \text{(average)}
\end{aligned} \tag{3}
$$

where the indexes 1, 2, 3, 4 represent the spectral types of the galaxies (in parentheses in Eq. 3), and the average relation K_avg(z) is used for galaxies with undefined spectral type.
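The distance and magnitude computation of Eqs. (1)-(3) can be sketched in Python as follows. This is a minimal sketch, not the authors' pipeline: it assumes a flat model with Ω_M = 0.3, Ω_Λ = 0.7 and h = 1 (so distances come out in Mpc/h), integrates Eq. (1) with Simpson's rule, and the helper names (`metric_distance`, `k_correction`, `absolute_magnitude`) are hypothetical.

```python
import math

C_KM_S = 299792.458          # speed of light [km/s]
H0 = 100.0                   # km/s/Mpc with h = 1, so distances are in Mpc/h
OMEGA_M, OMEGA_L = 0.3, 0.7  # flat model parameters quoted in the text

# Eq. (3): coefficients (a, b) of K(z) = a z + b z^2 (Madgwick et al. 2002)
K_COEFFS = {
    1: (2.6, 4.3),      # E/S0
    2: (1.9, 2.2),      # Sa/Sb
    3: (1.3, 2.0),      # Sc/Sd
    4: (0.9, 2.3),      # Irr
    "avg": (1.9, 2.7),  # undefined spectral type
}


def metric_distance(z, n_steps=200):
    """Eq. (1) via composite Simpson's rule; returns r(z) in Mpc/h."""
    n_steps += n_steps % 2  # Simpson's rule needs an even number of intervals
    step = z / n_steps

    def inv_e(x):
        return 1.0 / math.sqrt(OMEGA_M * (1.0 + x) ** 3 + OMEGA_L)

    total = inv_e(0.0) + inv_e(z)
    for k in range(1, n_steps):
        total += (4.0 if k % 2 else 2.0) * inv_e(k * step)
    return (C_KM_S / H0) * total * step / 3.0


def k_correction(z, spectral_type="avg"):
    """K_i(z); unknown or missing spectral types fall back to the average relation."""
    a, b = K_COEFFS.get(spectral_type, K_COEFFS["avg"])
    return a * z + b * z * z


def absolute_magnitude(b_j, z, spectral_type="avg"):
    """Eq. (2): M = b_J - 5 log10[r(z)(1+z)] - K_i(z) - 25, with r in Mpc/h."""
    r = metric_distance(z)
    return b_j - 5.0 * math.log10(r * (1.0 + z)) - k_correction(z, spectral_type) - 25.0


# Faint absolute-magnitude limit at z = 0.1 for the survey limit b_J = 19.45
m_limit = absolute_magnitude(19.45, 0.1)
```

Galaxies at a given z brighter than `m_limit` pass the apparent-magnitude cut; evaluating the same function at the distance limits r_min and r_max is what fixes the magnitude window of a VL sample.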
Fig. 1. The metric distance-absolute magnitude diagram for the SGP strip [axis: r(z) in Mpc, H_0 = 100 km/s/Mpc]. The boundaries of the SGP400 subsample are shown.



2.2. Definition of volume limited subsamples 

To take into account the selection effect that arises from the 2dFGRS apparent magnitude limits 14 < b_J < 19.45, one has to consider two limits for the metric distance, r_min < r < r_max, and compute the two corresponding limits for the absolute magnitude, M_min(r_min) and M_max(r_max), which represent the lower and the upper limits for the galaxies contained in a VL sample.

To this aim, we select three distance intervals (50-250 Mpc/h, 100-400 Mpc/h and 150-550 Mpc/h) and compute the corresponding absolute magnitude limits for each of the two strips. We thus obtain three VL subsamples for the northern hemisphere and three for the southern one, whose main parameters are presented in Table 1 (note that hereafter we set h = 1 unless specified otherwise). An example of the distance-magnitude limits for the SGP400 sample (which is the largest one considered in this paper) is shown in Fig. 1. In Fig. 2 we show the behavior of the differential number counts dN(r)/dr as a function of distance in different sky areas for the sample SGP400. In particular, we put limits at δ < −27° (c4), δ > −27° (c5), α > 189° (c6) and α < 189° (c7), respectively. As an example we report the best fit for the sample c4, which shows an exponent corresponding to a metric dimension (D = 3.7) larger than the space dimension. This is a purely finite-size effect, corresponding to the large fluctuations still visible at scales of order 100 Mpc/h.
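A fit like the one quoted for c4 can be sketched by histogramming radial distances and fitting a straight line in log-log space. This sketch assumes that the counts follow dN/dr ∝ r^(D−1), so that the fitted slope plus one gives D. The helpers are hypothetical, and the test data are a synthetic homogeneous sample, for which D should come out close to 3 rather than the 3.7 found in c4.

```python
import math
import random


def radial_counts(distances, r_min, r_max, n_bins):
    """Differential counts dN/dr in linear bins between r_min and r_max."""
    width = (r_max - r_min) / n_bins
    counts = [0] * n_bins
    for r in distances:
        if r_min <= r < r_max:
            counts[min(int((r - r_min) / width), n_bins - 1)] += 1
    centers = [r_min + (k + 0.5) * width for k in range(n_bins)]
    return centers, [c / width for c in counts]


def fit_metric_dimension(centers, dndr):
    """Least-squares slope s of log(dN/dr) vs log r; dN/dr ~ r^(D-1) gives D = s + 1."""
    pts = [(math.log(r), math.log(n)) for r, n in zip(centers, dndr) if n > 0]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx + 1.0


# Sanity check on a homogeneous (Poisson) sample: radii of points uniform in a ball
rng = random.Random(42)
radii = [100.0 * rng.random() ** (1.0 / 3.0) for _ in range(50000)]
D_uniform = fit_metric_dimension(*radial_counts(radii, 10.0, 100.0, 20))
```

In a real survey strip the radial distances would come from Eq. (1); a value of D well above 3, as for c4, then signals an un-averaged density gradient inside the sampled volume rather than a property of the space itself.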



Fig. 2. Differential number counts in different sky areas (defined in the text) for the SGP400 subsample. As an example we report the best fit for the sample c4, which shows an exponent corresponding to a metric dimension larger than the space dimension. This is a purely finite-size effect, which may be explained by the presence of large-scale fluctuations in the studied region.

VL sample   r_min   r_max   M_min   M_max     N_g
SGP250         50     250   -19.5   -17.8   14177
SGP400        100     400   -20.8   -19.0   29373
SGP550        150     550   -21.2   -19.8   26289
NGP250         50     250   -19.5   -17.8   12474
NGP400        100     400   -20.8   -19.0   23208
NGP550        150     550   -21.2   -19.8   18030

Table 1. Main properties of the obtained VL samples: r_min and r_max are the chosen limits for the metric distance (in Mpc/h); M_min and M_max are the limits of the absolute magnitude interval, and N_g is the resulting number of galaxies in each sample.

3. Nearest neighbor probability density

In a stochastic point process, the probability ω(r)dr that the nearest neighbor of a given particle lies at a distance in the range [r, r+dr] provides a useful characterization of small-scale statistical properties. By definition, this probability density satisfies the normalization condition

$$ \int_0^\infty \omega(r)\, dr = 1 . \tag{4} $$

According to its definition, ω(r) can simply be estimated as

$$ \omega_E(r) = \frac{N_{nn}(r)}{\int_0^\infty N_{nn}(r')\, dr'} , \tag{5} $$

where N_nn(r) is the number of points which have their nearest neighbor in the range [r, r+dr].

The nearest neighbor probability density for a Poisson distribution with average density ⟨n⟩ is given by (Gabrielli et al. 2004)

$$ \omega(r) = 4\pi \langle n \rangle r^2 \exp\left( -\frac{4\pi}{3} \langle n \rangle r^3 \right) . \tag{6} $$
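Eqs. (5) and (6) can be checked against each other on a synthetic Poisson sample. The sketch below assumes a periodic cube, to sidestep the edge effects a real survey volume would have, and a brute-force O(N²) neighbor search that is only adequate for small N. The mean of Eq. (6), `math.gamma(4/3) * (3 / (4π⟨n⟩))**(1/3)`, follows by direct integration; all function names are hypothetical.

```python
import math
import random


def nearest_neighbor_distances(points, box):
    """Brute-force nearest-neighbor distance of every point in a periodic cube of side `box`."""
    result = []
    for i, p in enumerate(points):
        best = float("inf")
        for j, q in enumerate(points):
            if i == j:
                continue
            d2 = 0.0
            for a, b in zip(p, q):
                d = abs(a - b)
                d = min(d, box - d)  # minimum-image convention removes edge effects
                d2 += d * d
            best = min(best, d2)
        result.append(math.sqrt(best))
    return result


def omega_estimate(distances, dr, r_max):
    """Eq. (5): histogram N_nn(r) normalized so that sum(omega) * dr = 1."""
    counts = [0] * int(math.ceil(r_max / dr))
    for r in distances:
        k = int(r / dr)
        if k < len(counts):
            counts[k] += 1
    norm = sum(counts) * dr
    return [c / norm for c in counts]


def omega_poisson(r, density):
    """Eq. (6)."""
    return 4.0 * math.pi * density * r * r * math.exp(-4.0 * math.pi * density * r ** 3 / 3.0)


def mean_nn_poisson(density):
    """Mean of Eq. (6): Gamma(4/3) * (3 / (4 pi n))^(1/3), Gamma being Euler's function."""
    return math.gamma(4.0 / 3.0) * (3.0 / (4.0 * math.pi * density)) ** (1.0 / 3.0)


rng = random.Random(1)
n_points, box = 600, 1.0
points = [(rng.random(), rng.random(), rng.random()) for _ in range(n_points)]
nn = nearest_neighbor_distances(points, box)
ratio = (sum(nn) / len(nn)) / mean_nn_poisson(n_points / box ** 3)  # close to 1 for Poisson
omega = omega_estimate(nn, 0.005, 0.3)
```

For correlated data the same comparison gives a ratio below 1, which is the signature of small-scale clustering discussed for SGP400.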



In Fig. 3 we present an example of the observed ω(r) distribution in the VL sample SGP400, along with an artificial Poisson distribution with the same number of points in the same three-dimensional volume. Note that the probed scales here are about 0.1-10 Mpc/h.

Fig. 3. Nearest neighbor probability density for the SGP400 sample data (squares) and for a Poisson simulation (circles) in the same volume. The dotted lines are the best fits for the anisotropic Poisson distribution (Eq. 7 with γ = 1.2) and for the Poisson case (Eq. 6), respectively.

For the actual data the average distance between nearest galaxies is smaller than in the Poisson case, and this is clear evidence of the presence of small-scale correlations. The exact analytical behavior of ω(r) for the general case of a power-law correlated structure is unknown; an approximate relation for the simple case of an anisotropic Poisson distribution, which presents a radial density profile decaying from its center as n_c(r) ~ r^(−α) (with α > 1.5; see the discussion in Gabrielli et al. 2004), is given by

$$ \omega(r) = 4\pi C r^{2-\gamma} \exp\left( -\frac{4\pi C}{3-\gamma} r^{3-\gamma} \right) , \tag{7} $$

where γ = 3 − 2α. This is found to be a good approximation for the actual data (see Fig. 3). In Table 2 (see below) we report the estimated average distance between nearest neighbors, defined as r_sep = ∫ r ω(r) dr, in the different samples (note that similar values have been found by Peebles 2001).

4. Estimation of correlations: the conditional density

In general, in a distribution of points with large fluctuations at some scales, one may determine two-point correlations through the estimation of the conditional density (see the discussion in Gabrielli et al. 2004). We first briefly summarize the main properties of this statistical tool, stressing the finite-size effects and statistical errors which may enter into the estimators. We then apply it to the VL samples extracted from the 2dFGRS, as discussed in the previous section.

4.1. Conditional density in spheres Γ*(r)

The conditional density in spheres Γ*(r) is defined as

$$ \Gamma^*(r) = \frac{\langle N(r) \rangle_p}{\| C(r) \|} . \tag{8} $$

This quantity measures the average number of points ⟨N(r)⟩_p contained in a sphere of volume ‖C(r)‖ = (4/3)πr^3, with the condition that the center of the sphere lies on an occupied point of the distribution (⟨...⟩_p denotes the conditional ensemble average).

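Such a quantity can be estimated in a finite sample by a volume average (assuming stationarity of the point distribution)

$$ \Gamma^*_E(r) = \frac{\langle N(r) \rangle_p}{\| C(r) \|} = \frac{1}{N_c(r)} \sum_{i=1}^{N_c(r)} \frac{N_i(r)}{\| C(r) \|} , \tag{9} $$

where N_c(r) is the number of points (centers) whose spheres are fully contained in the sample volume, N_i(r) is the number of points within distance r of the i-th center, and ⟨...⟩_p here denotes the average over the sample points.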

Given a sample of arbitrary geometry and a scale r at which correlations are measured, only a subsample of the points it contains will satisfy the following requirement: when chosen as the center of a sphere of radius r, the sphere is fully contained in the sample volume. When the average in Eq. 9 is made over such a subsample, one is considering the full-shell estimator of the conditional density. Note that the number of centers N_c(r) is a function of the scale r at which correlations are estimated. In fact, for scales much smaller than the radius R_s of the largest sphere fully contained in the sample volume, almost all points contribute to the average, while at scales comparable to the sample size only the points lying near the center of the sample volume contribute. Thus finite-size effects can be important at the largest available scales: in this situation one cannot make a full volume average, and systematic effects due to large fluctuations can be important in the determination of such a statistic.
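A brute-force version of the full-shell estimator of Eq. (9) is sketched below. As an assumption made for illustration, the sample volume is a cube [0, L]³ rather than a survey geometry, so the "sphere fully contained" test reduces to a distance-to-faces check; the function names are hypothetical. For a Poisson sample the estimate should be close to the number density, while N_c(r) shrinks as r grows, which is the finite-size effect described above.

```python
import math
import random


def sphere_inside_cube(center, radius, box):
    """True if the sphere of given radius around `center` lies inside [0, box]^3."""
    return all(radius <= c <= box - radius for c in center)


def gamma_star_full_shell(points, r, box):
    """Eq. (9): average of N_i(r) / (4/3 pi r^3) over the N_c(r) centers whose sphere
    fits entirely in the cube. Returns (estimate, N_c)."""
    volume = 4.0 * math.pi * r ** 3 / 3.0
    counts = []
    for i, p in enumerate(points):
        if not sphere_inside_cube(p, r, box):
            continue  # this point cannot be a center at scale r
        n_i = sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= r)
        counts.append(n_i)
    if not counts:
        return float("nan"), 0
    return sum(counts) / (len(counts) * volume), len(counts)


rng = random.Random(7)
pts = [(rng.random(), rng.random(), rng.random()) for _ in range(1000)]
g_small, nc_small = gamma_star_full_shell(pts, 0.2, 1.0)
g_large, nc_large = gamma_star_full_shell(pts, 0.35, 1.0)
```

The drop from roughly two hundred centers at r = 0.2 to a few tens at r = 0.35 in this toy cube mirrors why the last points of Γ*(r) in a survey rest on very few centers.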

The scale R_s will in general be very different from the scales r_min and r_max characterizing a VL sample, as it depends crucially on the sample solid angle. On the other hand, the minimal scale r_sep above which correlations can be measured is given by the average distance between neighboring galaxies: clearly, for r < r_sep discrete shot noise dominates the estimation of any statistical quantity. We will therefore explicitly compute the scales r_sep and R_s for the VL samples considered in what follows (see Table 2).
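For the approximate form of Eq. (7), r_sep = ∫ r ω(r) dr has a closed form, obtained by substituting u = 4πC r^(3−γ)/(3−γ): r_sep = G((4−γ)/(3−γ)) · ((3−γ)/(4πC))^(1/(3−γ)), with G the Euler gamma function. For γ = 0 and C = ⟨n⟩ it reduces to the Poisson value implied by Eq. (6). This closed form is derived here for illustration and is not quoted in the original; C and γ are free fit parameters, and the values below are arbitrary. The sketch checks the closed form against direct numerical integration.

```python
import math


def omega_eq7(r, C, gamma):
    """Eq. (7): 4 pi C r^(2-gamma) exp(-4 pi C r^(3-gamma) / (3-gamma)), valid for gamma < 3."""
    g = 3.0 - gamma
    return 4.0 * math.pi * C * r ** (2.0 - gamma) * math.exp(-4.0 * math.pi * C * r ** g / g)


def r_sep_closed_form(C, gamma):
    """Mean of Eq. (7) via the substitution u = 4 pi C r^(3-gamma) / (3-gamma)."""
    g = 3.0 - gamma
    return math.gamma((4.0 - gamma) / g) * (g / (4.0 * math.pi * C)) ** (1.0 / g)


def r_sep_numeric(C, gamma, r_max, n_steps=20000):
    """Trapezoidal rule for int_0^r_max r omega(r) dr (the integrand vanishes at r = 0)."""
    step = r_max / n_steps
    total = 0.5 * r_max * omega_eq7(r_max, C, gamma)
    for k in range(1, n_steps):
        r = k * step
        total += r * omega_eq7(r, C, gamma)
    return total * step


# Illustrative parameters only (gamma = 1.2 as in Fig. 3, arbitrary C)
sep_closed = r_sep_closed_form(0.01, 1.2)
sep_numeric = r_sep_numeric(0.01, 1.2, r_max=50.0)
```

The upper limit `r_max` only needs to be large enough for the exponential tail to be negligible; for these parameters the integrand is already tiny well before r = 50.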
4.2. Conditional density in shells Γ(r)

The conditional density in spherical shells is defined as

$$ \Gamma(r) = \frac{\langle N(r, \Delta r) \rangle_p}{\| C(r, \Delta r) \|} , \tag{10} $$

where ⟨N(r, Δr)⟩_p represents the ensemble average number of points in a spherical shell of radius r and thickness Δr, of volume ‖C(r, Δr)‖ = (4/3)π[(r + Δr)^3 − r^3], around a point of the distribution (thus this is a conditional ensemble average ⟨...⟩_p, as in the previous case). Note that one can also write Eq. 10 as

$$ \Gamma(r) = \frac{\langle n(r)\, n(0) \rangle}{\langle n(0) \rangle} , \tag{11} $$

where ⟨...⟩ represents the (unconditional) ensemble average and n(r) is the microscopic number density.

VL sample   r_sep (2dFGRS)   r_sep (Poisson)    R_s
NGP250           1.3              2.0           12.4
NGP400           1.7              2.6           19.9
NGP550           2.7              3.9           27.4
SGP250           1.5              2.4           18.2
SGP400           1.9              3.0           29.1
SGP550           2.8              4.2           40.0

Table 2. Characteristic scales of the VL samples: r_sep is the average separation between nearest neighbor galaxies (in the 2dFGRS and in a Poisson distribution within the same volume and with the same number of galaxies); R_s is the radius of the largest sphere completely contained in the sample. All distances are in Mpc (H_0 = 100 km/sec/Mpc).

The conditional density in shells can be estimated in a finite sample by the following volume average:

$$ \Gamma_E(r) = \frac{\langle N(r, \Delta r) \rangle_p}{\| C(r, \Delta r) \|} = \frac{1}{N_c(r + \Delta r)} \sum_{i=1}^{N_c(r + \Delta r)} \frac{N_i(r, \Delta r)}{\| C(r, \Delta r) \|} , \tag{12} $$

where we again consider only the full-shell estimator, i.e. N_c(r + Δr) represents the number of points (centers) whose spherical shells are fully contained in the sample volume. Analogously to the case of Γ*(r), particular care should be taken in determining the scales r_sep and R_s.
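The shell estimator of Eq. (12), with shells of constant thickness in log r as used later for the 2dFGRS samples, can be sketched in the same brute-force way. The cubic sample volume and all names are assumptions for illustration. Note that the set of valid centers is recomputed for every shell, since it depends on the outer radius r + Δr.

```python
import math
import random


def log_shell_edges(r_lo, r_hi, n_shells):
    """Shell edges with constant thickness in log r."""
    ratio = (r_hi / r_lo) ** (1.0 / n_shells)
    return [r_lo * ratio ** k for k in range(n_shells + 1)]


def gamma_shells_full_shell(points, box, edges):
    """Eq. (12) for each shell [r, r + dr): only centers whose outer sphere (radius r + dr)
    lies inside the cube [0, box]^3 are used, so N_c depends on the shell."""
    estimates = []
    for r_in, r_out in zip(edges[:-1], edges[1:]):
        volume = 4.0 * math.pi * (r_out ** 3 - r_in ** 3) / 3.0
        total, n_centers = 0, 0
        for i, p in enumerate(points):
            if not all(r_out <= c <= box - r_out for c in p):
                continue  # shell would cross the sample boundary
            n_centers += 1
            total += sum(1 for j, q in enumerate(points)
                         if j != i and r_in <= math.dist(p, q) < r_out)
        estimates.append(total / (n_centers * volume) if n_centers else float("nan"))
    return estimates


rng = random.Random(3)
pts = [(rng.random(), rng.random(), rng.random()) for _ in range(600)]
edges = log_shell_edges(0.1, 0.3, 3)
gamma_shells = gamma_shells_full_shell(pts, 1.0, edges)
```

Thin shells contain fewer pairs than the corresponding spheres, which is why Γ(r) estimates are noisier than Γ*(r) at the same scale; the log-constant thickness keeps the relative noise roughly comparable across scales.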

It is instructive to notice that when the distribution has power-law correlations and strong fluctuations (e.g. a fractal structure), the conditional density in spheres behaves (in the ensemble average) as

$$ \Gamma^*(r) = \frac{3B}{4\pi} r^{-\gamma} , \tag{13} $$

while the conditional density in shells has the form

$$ \Gamma(r) = \frac{(3-\gamma)B}{4\pi} r^{-\gamma} , \tag{14} $$

where γ is the correlation exponent (in the case of a fractal, D = 3 − γ is the fractal dimension) and B is a lower cut-off related to the smallest scale at which correlations can be measured in a finite sample (i.e. to r_sep defined above).
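Eqs. (13) and (14) follow from assuming that the average number of points within r of an occupied point grows as N(<r) = B r^(3−γ). Dividing by the sphere volume gives Eq. (13), and dividing dN/dr by 4πr² (the thin-shell limit of Eq. 10) gives Eq. (14), so that Γ*(r)/Γ(r) = 3/(3−γ). The short numerical check below takes that counting law as its assumption; the function names are hypothetical.

```python
import math


def counts_within(r, B, gamma):
    """Assumed ensemble counts around an occupied point: N(<r) = B r^(3 - gamma)."""
    return B * r ** (3.0 - gamma)


def gamma_star_eq13(r, B, gamma):
    return 3.0 * B / (4.0 * math.pi) * r ** (-gamma)


def gamma_shell_eq14(r, B, gamma):
    return (3.0 - gamma) * B / (4.0 * math.pi) * r ** (-gamma)


def from_counts(r, B, gamma, dr=1e-5):
    """Gamma*(r) = N(<r) / (4/3 pi r^3) and Gamma(r) = (dN/dr) / (4 pi r^2),
    the latter from a central finite difference (thin-shell limit of Eq. 10)."""
    g_star = counts_within(r, B, gamma) / (4.0 * math.pi * r ** 3 / 3.0)
    dn_dr = (counts_within(r + dr, B, gamma) - counts_within(r - dr, B, gamma)) / (2.0 * dr)
    return g_star, dn_dr / (4.0 * math.pi * r * r)


g_star, g_shell = from_counts(10.0, 2.0, 0.8)
```

Both estimators share the exponent γ and differ only in amplitude, which is why the slopes of Figs. 4 and 6 can be compared directly.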

4.3. Application to 2dFGRS data 

In Table 2 we show, for the different VL samples considered, the lower and upper cut-offs discussed above, between which we have estimated Γ(r) and Γ*(r). Note that for each VL sample we have generated a Poisson distribution with the same number of points in the same three-dimensional volume, in order to estimate the same statistical quantities in a distribution without any correlation. This provides a useful way to test our analysis on the simplest distribution with known properties. Note also that all our estimates have been done in redshift space: the relation with real-space properties will be discussed in Sec. 6.

Fig. 4. Estimation of the conditional density in spheres in the six VL samples considered (different symbols correspond to different VL samples; see labels). The reference line has a power-law behavior with slope γ = 0.8. [axis: r (Mpc/h)]

Fig. 4 shows the estimated conditional density in spheres in the six VL samples considered. It is interesting to note that samples with the same luminosity and distance cuts in the NGP and SGP show approximately the same behavior. However, a difference in amplitude is present for all but the deepest samples. The amplitude of Γ*(r) is related to the luminosity function in the following way.

In general, the joint conditional probability of finding a galaxy of luminosity L at distance r from another galaxy, i.e. the (ensemble) conditional average number of galaxies with luminosity in the range [L, L+dL] and in the volume element d^3r at distance r from an observer located on a galaxy, is given by ⟨ν(L, r)⟩_p d^3r dL. One can then assume that

$$ \langle \nu(L, r) \rangle_p = \phi(L) \times \Gamma(r) , \tag{15} $$

where Γ(r) is the average conditional density and φ(L) is the luminosity function, such that φ(L)dL gives the probability that a randomly chosen galaxy has luminosity in the range [L, L + dL]. By writing Eq. 15 as the product of the conditional space density and the luminosity function, one has implicitly assumed that galaxy positions are independent of galaxy luminosity. Thus from Eq. 15 it follows that the amplitude of Γ*(r) in a VL sample is given by the integral of the luminosity function over the range of absolute luminosity covered by the sample, multiplied by the conditional density of all galaxies. Amplitude variations between VL samples with the same cuts in the NGP and SGP can be due to large local fluctuations which are not averaged out by the volume average; these differences can thus probably be ascribed to finite-size effects. The fact that in the deepest VL samples (i.e. the ones cut at 550 Mpc/h), where the volume is the largest, the conditional density does not show significant differences between the two hemispheres supports the finite-size interpretation.

Fig. 5. Conditional density in spheres normalized to its value at 10 Mpc/h. In this way it is apparent that the slope varies in the different samples. The reference line has a power-law behavior with slope γ = 0.8.
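The amplitude argument based on Eq. (15) can be illustrated numerically: the amplitude factor of a VL sample is the integral of φ over its magnitude window. The sketch assumes a Schechter form for φ with made-up parameters (M* = −19.7, α = −1.2), which are not values measured in this paper, and uses only the magnitude windows of Table 1. Under this assumption, windows reaching fainter magnitudes give a larger factor for the same Γ(r).

```python
import math

# Hypothetical Schechter parameters, for illustration only (not values from this paper)
M_STAR, ALPHA = -19.7, -1.2


def schechter_mag(m, m_star=M_STAR, alpha=ALPHA):
    """Schechter luminosity function per unit magnitude, up to a constant factor."""
    x = 10.0 ** (-0.4 * (m - m_star))  # L / L*
    return x ** (alpha + 1.0) * math.exp(-x)


def vl_amplitude_factor(m_bright, m_faint, n_steps=2000):
    """Midpoint-rule integral of phi over [m_bright, m_faint]; under Eq. (15) the
    amplitude of Gamma*(r) in a VL sample scales with this number."""
    step = (m_faint - m_bright) / n_steps
    return step * sum(schechter_mag(m_bright + (k + 0.5) * step) for k in range(n_steps))


# Magnitude windows (M_min, M_max) of the VL samples in Table 1
windows = {"250": (-19.5, -17.8), "400": (-20.8, -19.0), "550": (-21.2, -19.8)}
amplitudes = {name: vl_amplitude_factor(*lims) for name, lims in windows.items()}
```

Since NGP and SGP samples with the same cuts share the same window, Eq. (15) predicts equal amplitudes for them; any residual difference is what the text attributes to un-averaged fluctuations.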

If one fits the behavior of the estimated Γ*(r) with a power-law function of the type Br^(−γ), one finds γ = 0.8 ± 0.2. In Fig. 5 we have normalized the conditional density in spheres to its value at 10 Mpc/h: in this way it is apparent that the slope varies in the different samples, by about 0.1. The formal statistical error in the determination of Γ*(r) at each scale can simply be derived from the dispersion of the average

$$ \Sigma^2(r) = \frac{1}{N} \sum_{i=1}^{N} \frac{\left( \Gamma^*_i(r) - \Gamma^*(r) \right)^2}{N - 1} , \tag{16} $$



where Γ*_i(r) represents the determination from the i-th point (center) and N is the number of centers contributing at scale r. The corresponding error bars are too small to be plotted. However, one should notice that at large scales (usually the last few points) the estimators of Γ*(r) show large scatter because of the small number of points contributing to the average. Moreover, this estimation cannot take into account systematic variations due to the fact that the volume average cannot be performed at large scales (see the discussion in Joyce, Montuori & Sylos Labini 1999). For these reasons the behavior at scales larger than ~20 Mpc/h is affected by large un-averaged fluctuations.

Fig. 6. Estimation of the conditional density in shells for the different VL samples considered. The reference line has a power-law behavior with slope γ = 0.7.

Fig. 7. Estimation of the conditional density in shells normalized to its value at 10 Mpc/h for the different VL samples considered. The reference line has a power-law behavior with slope γ = 0.7. [axis: r (Mpc/h)]
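Eq. (16) is the squared standard error of the mean over the N contributing centers. The minimal sketch below computes it from a list of per-center values Γ*_i(r); the function name is hypothetical. As the text notes, it ignores the correlations between centers and the systematic effects of the missing volume average, so it likely underestimates the real uncertainty.

```python
import math


def gamma_star_with_error(per_center):
    """Mean of the per-center values Gamma*_i(r) and Sigma(r) from Eq. (16):
    Sigma^2 = sum_i (Gamma*_i - mean)^2 / (N (N - 1)), i.e. the squared standard error."""
    n = len(per_center)
    if n < 2:
        raise ValueError("at least two centers are needed")
    mean = sum(per_center) / n
    sigma2 = sum((g - mean) ** 2 for g in per_center) / (n * (n - 1))
    return mean, math.sqrt(sigma2)


mean_value, sigma = gamma_star_with_error([1.0, 2.0, 3.0, 4.0])
```

Because Σ shrinks as 1/√N even when neighboring centers see the same structures, thousands of centers at small r give formally tiny error bars, consistent with the remark that they are too small to be plotted.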

We show in Fig. 6 the behavior of the conditional density in shells, and in Fig. 7 the same quantity normalized to its value at 10 Mpc/h. These estimations are clearly more affected by statistical noise. An important parameter in this respect is the shell thickness, which we take constant in logarithmic scale. In this case the average slope is γ = 0.8 ± 0.2 up to 30 Mpc/h.
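The power-law fits and the normalization at 10 Mpc/h used in Figs. 5 and 7 can be sketched as an ordinary least-squares fit in log-log space. This is an assumed fitting procedure (unweighted, over a chosen range of r), not necessarily the one used for the quoted γ = 0.8 ± 0.2, and the helper names are hypothetical. On an exact power law it recovers the input parameters.

```python
import math


def fit_power_law(r, y, r_lo=None, r_hi=None):
    """Least-squares fit of log y = log B - gamma log r over r_lo <= r <= r_hi.
    Returns (B, gamma)."""
    pairs = [(math.log(a), math.log(b)) for a, b in zip(r, y)
             if (r_lo is None or a >= r_lo) and (r_hi is None or a <= r_hi) and b > 0]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(v for _, v in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (v - my) for x, v in pairs)
    slope = sxy / sxx
    return math.exp(my - slope * mx), -slope


def normalize_at(r, y, r_ref=10.0):
    """Divide by the value at the grid point closest to r_ref (as in Figs. 5 and 7)."""
    k = min(range(len(r)), key=lambda i: abs(r[i] - r_ref))
    return [v / y[k] for v in y]


r_grid = [10.0 ** (k / 10.0) for k in range(17)]   # 1 to ~40 Mpc/h
synthetic = [5.0 * r ** -0.8 for r in r_grid]      # exact power law, gamma = 0.8
B_fit, gamma_fit = fit_power_law(r_grid, synthetic)
normalized = normalize_at(r_grid, synthetic)
```

Normalizing removes the amplitude differences discussed for Eq. (15), so any remaining spread between curves in such a plot reflects differences in slope only.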
5. Estimation of the reduced two-point correlation function

The reduced two-point correlation function ξ(r) of a stochastic point process is defined (see e.g. Peebles, 1980) as

$$ \xi(r) = \frac{\langle n(r)\, n(0) \rangle}{\langle n(0) \rangle^2} - 1 = \frac{\Gamma(r)}{\langle n \rangle} - 1 , \tag{17} $$

where ⟨...⟩ indicates the ensemble average and ⟨n⟩ is the ensemble average number density. The last equality follows from the very definition of the conditional density (see Eq. 11).

There are several estimators of ξ(r), and we refer to Kerscher, Szapudi & Szalay (2000) and Gabrielli et al. (2004) for a detailed discussion of those used in the literature. One may consider, for example, the Landy & Szalay (1993) (LS) estimator, which is the most widespread in modern studies of the correlation function of large-scale structures because it is the minimal variance estimator for a Poisson distribution. It can be written as (Kerscher, Szapudi & Szalay 2000)

$$ \xi_{LS}(r) = \frac{N_R (N_R - 1)}{N_D (N_D - 1)} \frac{DD(r)}{RR(r)} - 2 \frac{N_R - 1}{N_D} \frac{DR(r)}{RR(r)} + 1 , \tag{18} $$

where N_D is the number of data (sample) points, N_R is the number of random points homogeneously distributed in the sample geometry, and DD(r), DR(r) and RR(r) are the numbers of data-data, data-random and random-random pairs, respectively. Note that in the artificial random catalogs generated for the estimation of Eq. 18 we have used a number of points in the range 4.5-9 × 10^4. However, the LS estimator can be biased by finite-size effects in the case of strongly correlated distributions, as we discuss in what follows; we have checked that the situation is substantially the same for the estimator introduced by Davis & Peebles (1983).
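A brute-force sketch of Eq. (18) for a single separation bin follows. It assumes, as the normalization factors in Eq. (18) imply, that DD and RR count ordered pairs (each distinct pair twice) while DR counts each data-random pair once; with this convention ξ_LS averages to zero for a Poisson sample. The names are hypothetical, and the O(N²) pair counting is only meant for small test catalogs.

```python
import math
import random


def auto_pairs_ordered(points, r_lo, r_hi):
    """Ordered pairs (i != j) with r_lo <= separation < r_hi: each distinct pair counted twice."""
    count = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if r_lo <= math.dist(points[i], points[j]) < r_hi:
                count += 1
    return 2 * count


def cross_pairs(data, rand, r_lo, r_hi):
    """Data-random pairs with r_lo <= separation < r_hi."""
    return sum(1 for p in data for q in rand if r_lo <= math.dist(p, q) < r_hi)


def xi_landy_szalay(data, rand, r_lo, r_hi):
    """Eq. (18) for one separation bin."""
    n_d, n_r = len(data), len(rand)
    dd = auto_pairs_ordered(data, r_lo, r_hi)
    rr = auto_pairs_ordered(rand, r_lo, r_hi)
    dr = cross_pairs(data, rand, r_lo, r_hi)
    return (n_r * (n_r - 1) / (n_d * (n_d - 1)) * dd / rr
            - 2.0 * (n_r - 1) / n_d * dr / rr + 1.0)


rng = random.Random(11)
data = [(rng.random(), rng.random(), rng.random()) for _ in range(300)]
rand = [(rng.random(), rng.random(), rng.random()) for _ in range(600)]
xi_poisson = xi_landy_szalay(data, rand, 0.1, 0.2)  # expected to be close to 0
```

Because the random catalog fixes the mean density of the whole sample, any large-scale fluctuation in the data shifts ξ_LS as a whole, which is the sensitivity to finite-size effects discussed next.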

Analogously to the full-shell estimator of the conditional density, one may define the following (full-shell) estimator of ξ(r), which can be derived directly from Eq. 17:

$$ \xi_{FS}(r) = \frac{\Gamma_E(r)}{\Gamma^*_E(R_s)} - 1 , \tag{19} $$

where Γ_E(r) is the estimator of the conditional density in shells and Γ*_E(R_s) is the estimator of the conditional density in spheres at the scale of the sample R_s. Although the latter quantity is not, in general, computed through an average, because only a single point may contribute at such large scales, this estimator has several advantages with respect to the LS estimator (or the one used by Davis & Peebles, 1983) when the properties of the distribution are unknown and likely to be characterized by strong fluctuations.

We notice that by using the full-shell estimator we are able to make a very conservative measurement of the two-point correlation function. In fact, for example, one does not need to estimate correlations on scales larger than R_s, which would require the use of weighting schemes and a special treatment of boundary conditions. The main point, however, is that the estimation of the sample density is performed on "local" scales, i.e. much smaller than the global scale of the sample. In addition, Eq. 19 satisfies the simple constraint

$$ \int_0^{R_s} \xi_{FS}(r)\, r^2\, dr = 0 , \tag{20} $$

which is the so-called "integral constraint". Any estimator of ξ(r) must satisfy a similar condition, which comes from the fact that the average density has been estimated from the given sample. (Note that we do not use any additional correction to account for this effect: in the case of the full-shell estimator the integral constraint has the clear effect given by Eq. 20, and for the LS estimator we have not applied any correction for this constraint either.) It is however clear that Eq. 20 gives us a simple way of controlling this offset, which is not the case for other estimators. It is important to stress that the estimate of the sample average density is subject to large fluctuations, because its determination does not involve any average. Such fluctuations will substantially alter the amplitude of the reduced correlation function, as we discuss below: this is a good reason to measure statistical quantities, like Γ(r), which are not affected by such fluctuations.

Fig. 8. Estimation of the two-point reduced correlation function in the different VL samples considered, using the full-shell estimator. The reference line has a power-law behavior with slope γ = 0.75. [axis: r (Mpc/h)]
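The integral constraint of Eq. (20) holds by construction once the conditional density in spheres at the sample scale R_s (the radius of the largest sphere contained in the sample) is identified with the volume average of Γ(r) inside R_s, that is Γ*(R_s) = (3/R_s³) ∫_0^{R_s} Γ(r) r² dr. The sketch checks this numerically for an assumed power-law Γ(r) = A r^(−γ); the grid, A, γ and R_s values are illustrative, and the helper names are hypothetical.

```python
import math


def gamma_powerlaw(r, amplitude=1.0, gamma=0.8):
    """Assumed conditional density Gamma(r) = A r^-gamma."""
    return amplitude * r ** (-gamma)


def midpoint_integral(f, upper, n_steps=100000):
    """Midpoint rule for int_0^upper f(r) dr (avoids evaluating f at r = 0)."""
    step = upper / n_steps
    return step * sum(f((k + 0.5) * step) for k in range(n_steps))


def gamma_star_at(R_s, gamma_fn=gamma_powerlaw):
    """Gamma*(R_s) = (3 / R_s^3) int_0^R_s Gamma(r) r^2 dr."""
    return 3.0 / R_s ** 3 * midpoint_integral(lambda r: gamma_fn(r) * r * r, R_s)


def xi_full_shell(r, g_star, gamma_fn=gamma_powerlaw):
    """Eq. (19) with a precomputed Gamma*(R_s)."""
    return gamma_fn(r) / g_star - 1.0


def integral_constraint(R_s, gamma_fn=gamma_powerlaw):
    """Left-hand side of Eq. (20); it should vanish up to discretization error."""
    g_star = gamma_star_at(R_s, gamma_fn)
    return midpoint_integral(lambda r: xi_full_shell(r, g_star, gamma_fn) * r * r, R_s)


R_s = 40.0
g_star_num = gamma_star_at(R_s)
g_star_exact = 3.0 * R_s ** -0.8 / (3.0 - 0.8)  # closed form for the power law
residual = integral_constraint(R_s)
```

Since ξ_FS(r) is forced to integrate to zero inside R_s, it must turn negative somewhere below R_s, whatever the true correlations are.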

The behavior of ξ_FS(r) is presented in Fig. 8. We note two main properties: first, the amplitude of ξ(r) changes in different samples, and second, the exponent in the strongly clustered regime (i.e. ξ(r) ≫ 1) is about γ = 0.75. Both results are in qualitative agreement with other analyses of the same samples. For example, Hawkins et al. (2003) found that in the full magnitude-limited sample the redshift-space value of the correlation exponent is γ = 0.75 in the range [0.1, 4] Mpc/h and then γ = 1.75 in the range [4, 10] Mpc/h (see their Fig. 6). This is, for example, what we find in the SGP250 sample, as shown in Fig. 9.

Fig. 9. Estimation of the two-point reduced correlation function in the VL sample SGP250, using the full-shell estimator. The reference lines have power-law behaviors with slopes γ = 0.75 and γ = 1.57, respectively.

Fig. 10. Estimation of the two-point reduced correlation function in the VL sample SGP400, using the full-shell estimator. The reference lines have power-law behaviors with slopes γ = 1.0 and γ = 1.7, respectively.

It is worth noticing that the slopes measured in different VL samples may vary, as shown, for example, by SGP400 in Fig. 10. It is interesting to note that different values of γ in redshift space have been found in other surveys: for example, in the CfA1 catalog γ = 1.8 in the range [0.1, 5] Mpc/h (Davis & Peebles 1983). As discussed below, we ascribe this change of slope, as well as the variation of the amplitude of ξ(r), to a finite-size effect. For this reason, while the qualitative behavior of the variation of the amplitude and exponent of ξ(r) is similar to many other estimations (e.g. Hawkins et al., 2003; Norberg et al. 2001), the quantitative comparison depends on the sample size and, most importantly, on the fluctuations which affect the determination of the sample density. As these fluctuations can be large and dependent on the specific sample considered, it is difficult to make a more quantitative comparison between our results and others.

It is also very interesting to note that the zero-crossing scale r_zc of ξ(r), visible as a sharp decay of ξ_FS(r) at r_zc in a log-log plot, depends on the sample size. This result can again be explained as a finite-size effect introduced by Eq. 20. This is an important feature, especially in the comparison between observations and numerical N-body simulations (see Sylos Labini 2005 for more detail).

Concerning the amplitude, we note that Norberg et al. 
(2001) found a similar variation of the redshift-space 
This is consistent with the results discussed here. The difference lies in the way these results are interpreted. While Norberg et al. (2001) ascribe the different amplitudes to different selections in luminosity (or spectral type, colors, etc.), we argue below that, given the behavior of the conditional density, such variations can easily be explained as a finite size effect.

5.1. The role of finite size effects in redshift space 

In order to show directly the importance of finite size effects, and to illustrate their role in a specific example, we have considered the sample SGP400 and constructed several subsamples. In all cases the other boundaries in α, δ, r remain the same as for the original sample, while an additional cut has been imposed. The sample C1 is cut at r < 250 Mpc/h, C2 at r < 300 Mpc/h, C3 at r < 350 Mpc/h, C4 at δ < -0.5 radians, C5 at δ > -0.5 radians, C6 at α > 3.3 radians, C7 at α < 3.3 radians and C8 at r < 315 Mpc/h. Note that in these subsamples the lower cut-off remains the same as for the full SGP400, while the upper cut-off changes: in what follows we focus on how the finite size effect at large scales influences the amplitude of the ξ-function. The results obtained with the Landy-Szalay estimator (Eq. 18) are shown in Fig. 11. One may note that the amplitude of ξ(r) varies in the different subsamples. We refer only to the amplitude variation of ξ(r) shown in Fig. 11, without making a detailed analysis of the power-law exponent and the corresponding amplitude. The reason for this choice is that the formal statistical errors are negligible compared with the large systematic errors (especially at large scales) due to the finite volume and the single realization. Instead of performing a precise estimation of r_0 and γ, we simply demonstrate the general behavior of the ξ-function. This variation is due to fluctuations in the large scale distribution of galaxies and is thus a volume dependent effect.
Therefore the amplitude of ξ(r) is affected by finite-size effects as long as the distribution has not been found to relax to a uniform one. On the one hand, the Landy-Szalay estimator uses a sample density computed on the scale of the whole sample, thus mixing large scale and small scale properties in the measurement of correlations. On the other hand, although the sample depth is of the order of hundreds of Mpc/h, finite size effects related to the presence of large scale structures can still be important. The use of the conditional density avoids both these problems.

Fig. 11. Estimation of the two-point reduced correlation function in the different subsamples of the SGP400 VL sample by the Landy-Szalay estimator (Eq. 18). The length-scale r_0 varies from 6.1 Mpc/h to 7.7 Mpc/h in the different samples.
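The Landy-Szalay estimator is commonly written as ξ = (DD - 2DR + RR)/RR, with each pair count normalized by the total number of pairs; here we assume that Eq. (18) has this standard form. The brute-force Python sketch below is a toy implementation for small point sets (the names `pair_counts` and `landy_szalay` are hypothetical, and this is not the code used for the 2dFGRS samples). It makes explicit that the normalizations use the total numbers of data and random points, i.e. the average density of the whole sample, which is the quantity identified above as sample-size dependent.

```python
import numpy as np

def pair_counts(a, b, edges, same=False):
    """Histogram of pair separations between point sets a (N, 3) and b (M, 3).

    With same=True, a and b are the same set and each unordered pair is
    counted once (self-pairs excluded). Brute force: O(N*M) memory.
    """
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
    if same:
        i, j = np.triu_indices(len(a), k=1)
        d = d[i, j]
    return np.histogram(d.ravel(), bins=edges)[0].astype(float)

def landy_szalay(data, randoms, edges):
    """Landy-Szalay estimate (DD - 2DR + RR) / RR with normalized pair counts.

    The normalizations use the total numbers of data and random points, so the
    estimate is tied to the mean density of the whole sample.
    """
    nd, nr = len(data), len(randoms)
    dd = pair_counts(data, data, edges, same=True) / (nd * (nd - 1) / 2.0)
    rr = pair_counts(randoms, randoms, edges, same=True) / (nr * (nr - 1) / 2.0)
    dr = pair_counts(data, randoms, edges) / (nd * nr)
    return (dd - 2.0 * dr + rr) / rr
```

For a Poisson sample and randoms filling the same volume, the estimate scatters around zero. The randoms must reproduce exactly the same boundaries as the data, which is why every subsample C1-C8 needs its own random catalog.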

In order to explain the amplitude and slope variations observed in the estimation of two-point properties by ξ(r), we introduce a simple toy model which may capture the main elements of the problem. One may, however, repeat the following argument for any distribution, and thus for any functional behavior of the conditional density found in the data. The point is that in the regime of strong clustering, evidenced by the range of scales where Γ(r) has not reached a clear flattening behavior, the determination of ξ(r), which relies on that of the average density, is sample size dependent.

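If the conditional density has a power-law behavior up to the size r_s, of the type

$$\Gamma(r) = B\, r^{-\gamma} \tag{21}$$

with 0 < γ < 3, then the estimation of the sample average through the conditional density in spheres is

$$\Gamma^*(r_s) = \frac{3}{4\pi r_s^3}\int_0^{r_s} \Gamma(r)\, 4\pi r^2\, dr = \frac{3B}{3-\gamma}\, r_s^{-\gamma} . \tag{22}$$

Thus from Eq. (20) we find that

$$\xi(r) = \frac{\Gamma(r)}{\Gamma^*(r_s)} - 1 = \frac{3-\gamma}{3}\left(\frac{r}{r_s}\right)^{-\gamma} - 1 . \tag{23}$$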
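The toy model can be checked numerically. The Python sketch below assumes the pure power law of Eq. (21) and the spherical average of Eq. (22). It evaluates Γ*(r_s) by midpoint quadrature, compares the result with the closed form, and evaluates Eq. (23) for two sample sizes. The function names and parameter values are illustrative choices, not taken from the analysis of the 2dFGRS samples.

```python
import numpy as np

def gamma_cond(r, B, gamma):
    """Eq. (21): conditional density Gamma(r) = B r^-gamma (toy model)."""
    return B * r ** (-gamma)

def gamma_star_numeric(B, gamma, r_s, n=200_000):
    """Eq. (22) by midpoint quadrature: average of Gamma(r) over a sphere of radius r_s."""
    dr = r_s / n
    r = (np.arange(n) + 0.5) * dr  # midpoints, so r = 0 is never evaluated
    integral = np.sum(gamma_cond(r, B, gamma) * 4.0 * np.pi * r ** 2) * dr
    return 3.0 * integral / (4.0 * np.pi * r_s ** 3)

def gamma_star_exact(B, gamma, r_s):
    """Closed form of Eq. (22), valid for 0 < gamma < 3."""
    return 3.0 * B / (3.0 - gamma) * r_s ** (-gamma)

def xi_toy(r, gamma, r_s):
    """Eq. (23): xi(r) = (3 - gamma)/3 (r/r_s)^-gamma - 1; independent of B."""
    return (3.0 - gamma) / 3.0 * (r / r_s) ** (-gamma) - 1.0

if __name__ == "__main__":
    B, gamma = 1.0, 0.8
    for r_s in (20.0, 40.0):
        print(r_s, gamma_star_numeric(B, gamma, r_s),
              gamma_star_exact(B, gamma, r_s), xi_toy(5.0, gamma, r_s))
```

With γ = 0.8, ξ at r = 5 Mpc/h is about 1.22 for r_s = 20 Mpc/h and about 2.87 for r_s = 40 Mpc/h. The same underlying Γ(r) thus yields a larger ξ amplitude in the larger sample.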



One may note that Eq. (23) easily accounts both for the amplitude variation in samples of different size and for the change of the slope as a function of scale (due to the different regime of strong correlation where a fit with a power law is possible). Eq. (23) also shows that the slope depends on scale in a continuous way: for example, at r = r_0 such that ξ(r_0) = 1, one easily derives that the local slope becomes 2γ (see Fig. 8), i.e. 1.6 for γ = 0.8. Indeed, Hawkins et al. (2003) fitted the slope around the scale r_0 in the different samples (see their Tab. 1), with the consistent result that the slope is 1.6.
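The 2γ result follows from differentiating Eq. (23). Since ξ + 1 ∝ r^(-γ), one has -d ln ξ/d ln r = γ(1 + ξ)/ξ, which equals 2γ where ξ = 1. Setting ξ(r_0) = 1 in Eq. (23) also gives r_0 = r_s[(3 - γ)/6]^(1/γ), so in this toy model r_0 grows linearly with r_s (r_0 ≈ 0.285 r_s for γ = 0.8). The Python sketch below checks both statements numerically. The finite-difference step and the helper names are arbitrary, illustrative choices.

```python
import numpy as np

def xi_toy(r, gamma, r_s):
    """Eq. (23) of the toy model."""
    return (3.0 - gamma) / 3.0 * (r / r_s) ** (-gamma) - 1.0

def r0_toy(gamma, r_s):
    """Scale where xi(r0) = 1 in Eq. (23): r0 = r_s * ((3 - gamma)/6)^(1/gamma)."""
    return r_s * ((3.0 - gamma) / 6.0) ** (1.0 / gamma)

def local_slope(r, gamma, r_s, h=1e-5):
    """-d ln xi / d ln r at scale r, by a central finite difference in ln r."""
    lo = np.log(xi_toy(r * np.exp(-h), gamma, r_s))
    hi = np.log(xi_toy(r * np.exp(h), gamma, r_s))
    return -(hi - lo) / (2.0 * h)

if __name__ == "__main__":
    gamma = 0.8
    for r_s in (20.0, 40.0, 70.0):
        r0 = r0_toy(gamma, r_s)
        print(f"r_s = {r_s:5.1f}  r0 = {r0:6.2f}  slope at r0 = {local_slope(r0, gamma, r_s):.4f}")
```

Far below r_0, where ξ ≫ 1, the local slope tends to γ itself. A single power-law fit across r_0 therefore returns an exponent that depends on the fitting range.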

Moreover, we stress the crucial point that Γ*(r_s) can differ from Eq. (22) in a single sample determination. While the latter is the expectation value of the ensemble average quantity, the former is subject to large finite size fluctuations. This implies that the scaling of the amplitude of ξ(r) does not hold precisely in a single measurement, although it is expected in an ensemble of realizations (which cannot be obtained in the analysis of a single sample).

5.2. The role of finite size effects in real space 

We have not directly measured the real space properties here. However, the same finite-size effects which perturb the redshift space reduced two-point correlation function may affect the projected one (usually called w(r_p); see e.g. Davis & Peebles 1983). In general, one may relate the real space ξ_rs(R) to the projected w(r_p), where r_p is the projection of the redshift space distance on the direction perpendicular to the line of sight, through the equation

$$w(r_p) = 2\int_{r_p}^{\infty} \frac{\xi_{rs}(y)\, y}{\sqrt{y^2 - r_p^2}}\, dy . \tag{24}$$



Let us now consider the following situation. If the real space conditional density has the behavior Γ_rs(R) = A R^(-γ), then we can repeat the argument which leads to Eq. (23), with the result that

$$\xi_{rs}(R) = \frac{3-\gamma}{3}\left(\frac{R}{r_s}\right)^{-\gamma} - 1 , \tag{25}$$

where r_s is the sample depth, as discussed. Thus the real space ξ_rs(R) shows the same finite-size effects present in the redshift space correlation function discussed above. If ξ_rs(R) has a pure power-law behavior with γ > 1, then from Eq. (24) one gets

$$w(r_p) \propto r_p^{\,1-\gamma} . \tag{26}$$
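For a pure power law ξ_rs(R) = (R/r_0)^(-γ) with γ > 1, the integral of Eq. (24) has the known closed form w(r_p) = r_p (r_0/r_p)^γ Γ(1/2) Γ((γ - 1)/2)/Γ(γ/2), where Γ here denotes Euler's gamma function. This closed form is the origin of the r_p^(1-γ) scaling of Eq. (26). The Python sketch below evaluates Eq. (24) numerically after the substitution y = sqrt(r_p² + u²), which removes the integrable singularity at y = r_p, and compares the result with the closed form. The values r_0 = 5 Mpc/h and γ = 1.8 are illustrative assumptions only.

```python
import math
import numpy as np

def w_projected(xi, r_p, u_max_factor=1e6, n=20001):
    """Eq. (24) for a callable xi(R).

    Substituting y = sqrt(r_p^2 + u^2) turns y dy / sqrt(y^2 - r_p^2) into du,
    so w(r_p) = 2 * int_0^inf xi(sqrt(r_p^2 + u^2)) du. The integral is cut at
    u = u_max_factor * r_p and evaluated with the trapezoid rule on a log grid.
    """
    u = np.concatenate(([0.0], np.logspace(-6.0, math.log10(u_max_factor), n))) * r_p
    f = xi(np.sqrt(r_p ** 2 + u ** 2))
    return 2.0 * float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(u)))

def w_power_law(r_p, r0, gamma):
    """Closed form of Eq. (24) for xi(R) = (R/r0)^-gamma with gamma > 1.

    math.gamma is Euler's gamma function (not the conditional density).
    """
    return (r_p * (r0 / r_p) ** gamma * math.sqrt(math.pi)
            * math.gamma((gamma - 1.0) / 2.0) / math.gamma(gamma / 2.0))

if __name__ == "__main__":
    xi_pl = lambda R: (R / 5.0) ** (-1.8)  # illustrative r0 = 5 Mpc/h, gamma = 1.8
    for rp in (1.0, 5.0, 10.0):
        print(rp, w_projected(xi_pl, rp), w_power_law(rp, 5.0, 1.8))
```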



In the present situation this is not the case, because the second term in Eq. (25) gives an infinite contribution when integrated over all space. In practice, however, one truncates the integral at scales of order r_s, and the result can be trusted only at small enough scales. Thus

$$w(r_p) \approx 2\int_{r_p}^{r_s} \frac{\xi_{rs}(y)\, y}{\sqrt{y^2 - r_p^2}}\, dy . \tag{27}$$
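Eq. (27) can be evaluated with the same substitution, y = sqrt(r_p² + u²). The Python sketch below assumes the toy form of Eq. (25) with γ = 0.8 and r_s = 70 Mpc/h, the values quoted for Fig. 12. It computes w(r_p)/r_p and its local logarithmic slope. It is an illustrative reimplementation, not the code used for the figure, and the function names are hypothetical. For γ < 1 the truncated integral stays finite as r_p → 0, tending to 4γr_s/[3(1 - γ)] (about 373 Mpc/h for these values), which provides a simple consistency check.

```python
import math
import numpy as np

def xi_rs_toy(R, gamma, r_s):
    """Eq. (25): xi_rs(R) = (3 - gamma)/3 * (R/r_s)^-gamma - 1."""
    return (3.0 - gamma) / 3.0 * (R / r_s) ** (-gamma) - 1.0

def w_truncated(xi, r_p, r_s, n=20001):
    """Eq. (27), the projection integral cut at y = r_s (requires 0 < r_p < r_s).

    With y = sqrt(r_p^2 + u^2):  w(r_p) ~= 2 * int_0^sqrt(r_s^2 - r_p^2) xi(sqrt(r_p^2 + u^2)) du,
    evaluated with the trapezoid rule on a grid that is log-spaced away from u = 0.
    """
    u_max = math.sqrt(r_s ** 2 - r_p ** 2)
    u = np.concatenate(([0.0], np.geomspace(1e-6 * r_p, u_max, n)))
    f = xi(np.sqrt(r_p ** 2 + u ** 2))
    return 2.0 * float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(u)))

if __name__ == "__main__":
    gamma, r_s = 0.8, 70.0
    r_p = np.geomspace(0.5, 20.0, 15)
    w = np.array([w_truncated(lambda R: xi_rs_toy(R, gamma, r_s), rp, r_s) for rp in r_p])
    # local logarithmic slope of w(r_p)/r_p between neighbouring r_p values
    slope = np.diff(np.log(w / r_p)) / np.diff(np.log(r_p))
    for rp, s in zip(r_p[1:], slope):
        print(f"r_p = {rp:6.2f} Mpc/h  local slope of w/r_p = {s:+.2f}")
```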



The finite size effect introduced by the cut-off r_s may well account for the observed shape of w(r_p). For example, for γ = 0.8 we get from Eq. (25) and Eq. (27) the behavior shown in Fig. 12, which is very similar to the one measured by Hawkins et al. (2003). Hence, although γ < 1 in this example, w(r_p)/r_p behaves approximately as a power law of r_p in some finite range of scales (cf. the dashed line of slope -1.5 in Fig. 12). This is a finite size effect similar to the one that alters the estimate of the exponent of the redshift space correlation function. A similar finite-size effect may be present in the measurement of the angular two-point correlation function (see e.g. Montuori & Sylos Labini 1997). We cannot make a definitive statement about whether the behavior of w(r_p) found by, for instance, Hawkins et al. (2003) is perturbed by systematic biases. Our analysis shows, however, that finite size effects must also be considered explicitly in the computation of the real space properties.

Fig. 12. Behavior of w(r_p)/r_p computed numerically from Eq. (25) and Eq. (27) with γ = 0.8 and r_s = 70 Mpc/h. The dashed line has a slope of -1.5.



6. Discussion 

We have studied the redshift space correlation properties of six volume limited samples extracted from the 2dFGRS, considering several statistical properties. In particular, the characterization of small-scale properties through the nearest neighbor probability density allows us to determine the smallest scale down to which correlation properties can be studied in a robust way. At scales smaller than the average distance between nearest neighbors, typically a few Mpc/h (see Tab. 2), discrete shot noise dominates the measurements and leads to deviations from a power-law behavior. Zehavi et al. (2004) found departures from a power-law behavior in the galaxy correlation function of some samples of the SDSS catalog. Whether this result can be interpreted in the same way, i.e. as dominated by nearest-neighbor correlations, is an open question. They did not report the average distance between nearest neighbors in their samples, and moreover they performed the analysis in real space rather than in redshift space as we do here.
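As an illustration of how the nearest-neighbor scale compares with a purely random (Poisson) distribution of the same density, the Python sketch below measures the mean nearest-neighbor distance of a point set in a periodic cube. It compares the result with the Poisson expectation ⟨r_nn⟩ = Γ(4/3)(4πn/3)^(-1/3) ≈ 0.554 n^(-1/3), where Γ is Euler's gamma function. The periodic box and the brute-force search are simplifying assumptions. Real volume limited samples have irregular boundaries that require edge corrections, which are not included here.

```python
import math
import numpy as np

def mean_nn_distance(points, box):
    """Mean nearest-neighbour distance in a periodic cube of side `box`.

    Uses the minimum-image convention; brute force, so keep N to a few thousand.
    """
    diff = points[:, None, :] - points[None, :, :]
    diff -= box * np.round(diff / box)  # minimum image across periodic faces
    d2 = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)        # exclude each point from its own search
    return float(np.sqrt(d2.min(axis=1)).mean())

def poisson_mean_nn(n_density):
    """Expected nearest-neighbour distance for a 3D Poisson process of density n."""
    return math.gamma(4.0 / 3.0) * (4.0 * math.pi * n_density / 3.0) ** (-1.0 / 3.0)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    pts = rng.random((1000, 3))  # 1000 points in a unit cube: n = 1000
    print(mean_nn_distance(pts, 1.0), poisson_mean_nn(1000.0))
```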

We find that the conditional average density is characterized fairly well by a power-law behavior in the range between 0.5 and 40 Mpc/h, with exponent γ = 0.8 ± 0.2. This result is very robust at small scales (r < 20 Mpc/h), where the volume average can be properly performed. It becomes progressively weaker as the limits of the sample (set by the radius r_s of the largest sphere fully contained in it) are approached. Systematic noise, due to un-averaged large fluctuations, increases as r → r_s, and one way to overcome this problem is to consider larger samples. In this respect it is useful to compare our results with those derived by Hogg et al. (2005) from the largest sample ever studied for this kind of correlation analysis. They considered a sample of luminous red galaxies covering a volume of about 0.6 (Gpc/h)^3 and found the same power law as we find here up to 20-30 Mpc/h. They then detected a slow crossover toward homogeneity, which is eventually reached at 70 Mpc/h. Because of the limited solid angle of the survey, the data we have considered here can neither confirm nor disprove this result. It is worth noticing that, for example, Sylos Labini et al. (1998) found a similar value of the redshift space correlation exponent of the conditional density at those scales. Extending the analysis to larger scales, with statistical tests of weaker robustness, they found evidence for a continuation of correlations with almost the same exponent up to scales of the order of one hundred Mpc/h. The results of Hogg et al. (2005) apparently do not confirm these findings.

Leaving the question of the extension of the power-law behavior to further studies, we now focus on the interpretation of small-scale correlations. Up to scales of a few tens of Mpc/h, the conditional density Γ(r) shows a power-law behavior with exponent γ = 0.8 ± 0.2 and a well defined amplitude, although with some fluctuations in different sky regions. As discussed, the amplitude of the conditional density varies among VL samples according to the luminosity of the selected galaxies. This has a very simple explanation: brighter galaxies are less frequent than fainter ones. One can develop an analytical formalism that accounts for the galaxy luminosity function to understand this change. Under the hypothesis that space and luminosity are not correlated, usually adopted in studies of the large scale galaxy distribution, one can quantitatively compute the amplitude of the conditional density in different samples.

We have shown that the results we get for the reduced two-point correlation function, although in agreement with those obtained by other groups, are affected by finite size effects. The reason is simply that, as long as the distribution presents strong fluctuations, the study of ξ(r) is problematic. The regime of strong fluctuations is described by a certain functional behavior of the conditional density Γ(r), in the present case a power law. In this situation the estimation of the sample density is not only affected by large (statistical) noise but also becomes sample size dependent, i.e. it is affected by a systematic effect. However, because of the intrinsic large fluctuations, systematic and statistical noise are entangled in the information provided by the amplitude of ξ(r). Explicit tests for systematic finite size effects are therefore needed, and these are provided by the analysis of the conditional density.